ARTICLE INFO

Keywords: Possible selves; Possible identities; GPA; Machine coding; Identity-based motivation

ABSTRACT

Introduction: Despite the assumed importance of school-focused possible identities for academic motivation and outcomes, interventions rarely assess the effect of intervention on possible identities. This may be due to difficulty coding open-ended text at scale but leaves open a number of questions: 1) how school-focused possible identities change over the course of the school year, 2) whether these changes are associated with changes in school outcomes, and 3) whether a machine coding approach is viable.

Methods: In Study 1 (n = 247 Chicago 8th-graders) we assess fall-to-spring change in school-focused possible identities. We test whether change in school-focused possible identities predicts 8th-grade academic outcomes. We include robustness checks. Then we examine school context effects. In Study 2 (n = 1006 Chicago 8th-graders) we address the problem of coding at scale, using a separate data set to train a machine-learning algorithm.

Results: On average, school-focused possible identities decline over the school year. But nearly a third of students have increasing school-focused possible identity scores. Increase is associated with improved grades. School context influences whether linked strategies matter. Our machine-learning algorithm accurately classifies school-focused possible identities in our original sample and this school-focused classification reliably predicts academic trajectories.

Conclusions: Change in school-focused possible identities is normative over the course of the school year; interventions should take this into account. On average, students have fewer school-focused possible identities by spring. This decline is associated with declining academic trajectories. However, when school-focused possible identities increase, so do grades. Whether strategies matter is context dependent.


September: “[Next year I expect to be]... helpful (Whenever my friends ask me a question about the homework or about a class 
assignment and try to explain more.) paying more attention (This year I have been taking more notes and writing down anything I 
need to to help me understand what we are working on.) unstoppable (I'm trying to do anything I possibly can to get the best 
grades and to graduate. Next year I will do the same and no one will stop me from achieving my goals.) less distracted (I'm 
focusing more on my work and asking any questions I need to.)” ...... “[Next year I want to avoid]... trouble (Instead of for 
example laughing in class like I always do, I will do my best to be more serious and do nothing bad.) detention (I have never got 
detention and I will still try to avoid that because I do not want that affecting me.) fights (Instead of arguing with my friends I will 
try to reason with them and not let anything get physical and no matter what anyone says to me I'll talk to them instead of being 
physical.) bad grades (I have gotten bad grades last year and I know that really affected me and my future so this year and next 
year I will do my best to get good grades.) 


May: “[Next year, I expect to be]... successful (Focusing on my classwork and focusing on how to improve my grades.) peaceful”... 
“[Next year I want to avoid]... chaos (Ignore people who want to start something.) fakes (Ignore all fake people, just say hi and 
bye.) drama (Ignore all rumors.) fights (Make the right choice and step off.)” 


— The full response of a female 8th-grader in September and May to probes [in square brackets] about her possible identities and 
(in parentheses) her strategies to work toward expected and to avoid becoming like her feared possible identities. 


Interventions to improve academic outcomes often assume that the way a student imagines herself over the coming year matters 
for how much she engages with schoolwork. In the current paper we address three core gaps in the literature on these possible future 
identities, each of which is revealed in our opening quote. First, students have rich images of their possible identities—the identities 
they might attain for the coming year and they can describe an array of strategies—things they are doing now to work on these 
possible identities, but coding this rich idiographic text is difficult. Second, possible identities and strategies may change over the 
course of the school year, but as we describe below, the empirical literature has yet to provide a guideline regarding the likelihood of 
change or stability over the course of the school year. Third, though the literature describes an association between possible identities 
and academic outcomes, the empirical literature has yet to provide a guideline regarding the effect of upward or downward tra- 
jectories of change in possible identities on academic outcomes. These knowledge gaps are important for a number of reasons, as 
detailed next. 

First, lack of an efficient mechanism for coding rich idiographic data hampers advances in understanding which aspects of 
possible identities matter for motivation and behavior. Second, lack of knowledge regarding change and stability in possible identities 
over the course of the academic year means that theories about the process by which possible identities matter are necessarily vague. 
For example, given the current literature, it may be that possible identities sustain motivation, feeding hope for the future, when they 
are stable. But the reverse may also be true—possible identities may sustain motivation when they are likely to change. Third, lack of 
knowledge regarding the effect of change or stability in possible identities on future academic outcomes means that theories of 
change—the process models on which interventions are based—are also necessarily vague. Yet such change (e.g., an increase in 
school-focused possible identities, an increase in links between these possible identities and strategies for action) is often explicitly 
evoked as the active ingredient in interventions that do not measure possible identities at all (Elliott, Choi, Destin, & Kim, 2011; Lee, 
Husman, Scott, & Eggum-Wilkens, 2015; Rinaldi & Farr, 2018; Woolley et al., 2013). Furthermore, even when not an explicit active 
ingredient, change in possible identity content and link to strategies is often implicit in intervention process assumptions (Ansong 
et al., 2018; Destin & Svoboda, 2017; Lewis & Yates, 2019; Stephens, Hamedani, & Destin, 2014; Stephens, Townsend, Hamedani, 
Destin, & Manzo, 2015). After reviewing the extant literature, we report on the results of our field research. 


1. Do school-focused possible identities change over the course of the school year? 


As reviewed by Oyserman and James (2011), some of the earliest empirical research on the future or possible self focused on who 
school children were trying to become. Early researchers assumed that a taxonomy of development could be deduced from differences 
across gender, age and grade groups. This early taxonomy attempt was largely abandoned. Since then, a large literature suggests that 
though school is a dominant theme, students differ in the extent to which their possible identities are related to school (are “school- 
focused”). However, the literature does suggest that, at any point in time, students with school-focused possible identities attain 
better grades than those who do not and that having school-focused possible identities linked to strategies predicts better subsequent 
grades (for a review, Oyserman & James, 2011).¹ 

Knowing that academic success is associated with current or prior images of future possible academic success is important. But 
without a sense of whether possible identities are likely to change or remain stable over time, it is hard to know how to interpret these 
results. We looked for, but did not find, studies examining change (without intervention) over the course of the school year in child 
and adolescent school-focused possible identities. We also did not find studies examining whether change (or stability) in school- 
focused possible identities matters for academic trajectories, and whether school context affects the aspects of possible identities that 
matter. Instead, the studies we found examining change in possible identities without intervention focused on the relationship 
between changing health-focused possible identities and health outcomes among elderly and aging individuals (Frazier, Hooker, 
Johnson, & Kaus, 2000; Smith & Freund, 2002). These seemed too far removed from our focus. Moreover, as detailed next, the studies 
we found regarding change in children's, adolescents', and young adults' school-focused possible identities focused on researcher-directed rather than naturally occurring change, and the context studies we found did not actually examine differences in possible identities across context. 


¹ Operationalizations differ in whether possible identities are defined by content (e.g., school-focused or relationship-focused content) alone or are 
defined by content in combination with strategies. When operationalized in terms of content, researchers report on counts of school-focused 
identities (Bi & Oyserman, 2015), or counts of ‘balanced’ positive and negative school-focused identities (Oyserman et al., 1995), or code possible 
identities in other ways, such as concreteness (Rathbone, Salgado, Akan, Havelka, & Berntsen, 2016). When operationalized in terms of 
strategies, researchers report on counts of the number of possible identities students say they were “doing something” about (Oyserman & Saltz, 
1993), or counts of the number of student-generated strategies to attain school-focused possible identities (Oyserman et al., 2004), or code strategies 
in other ways, such as a score of the ‘plausibility’ of possible identity roadmaps (Oyserman et al., 2004). 


1.1. Intervening to change possible identities: does changing possible identities affect academic trajectories? 


We found one randomized intervention trial documenting that changing school-focused possible identities affects subsequent 
academic outcomes (Oyserman, Terry, & Bybee, 2006). This study is important because it has a measure of change in possible 
identities and a measure of change in academic outcomes. We found six additional experiments or interventions focused on changing 
possible identities which report measuring possible identities (each in a different way, Carroll, Shepperd, & Arkin, 2009; Kerpelman & 
Pittman, 2001; Kortsch, Kurtines, & Montgomery, 2008; Lee, Husman, Scott, & Eggum-Wilkens, 2015; Oyserman, Terry, & Bybee, 
2002; Stake & Nickens, 2005). None documented that changing possible identities changed academic trajectories. This gap in the 
literature is important because the idea that change in possible identities occurs and matters is, implicitly or explicitly, the basis for a 
large number of interventions taking place in middle schools (Destin & Svoboda, 2017; Woolley, Rose, Orthner, Akos, & Jones-Sanpei, 
2013), high schools (Rinaldi & Farr, 2018), and colleges (Stephens et al., 2014; 2015). 


1.2. How might context matter? 


We looked for studies examining how contexts might shape the relationship between school-focused possible identities and 
students’ academic outcomes. As detailed next, we found studies examining students in high poverty contexts and studies examining 
family social and economic resources. The poverty context studies suggest that linking school-focused possible 
identities to strategies for action may be particularly important for children living in high poverty contexts (Bi & Oyserman, 2015; 
Oyserman, Bybee, Terry, & Hart-Johnson, 2004; Oyserman, Bybee, & Terry, 2006; 2007). In these studies, having linking strategies, 
not just school-focused possible identities, predicted subsequent grades. All of these studies were in poverty contexts, so it is possible 
that the key finding (that possible identities alone are not enough and that students need linking strategies for action in order for 
possible identities to matter for grades) is context sensitive. It is possible that students in lower resourced contexts need to generate 
their own strategies for action while students in higher resourced contexts just need the school-focused possible identities because 
their schools provide the strategies for action. 

That said, the family social and economic resource studies suggest that families affect whether students generate these linking 
strategies (Oyserman, Brickman, & Rhodes, 2007; Oyserman, Johnson, & James, 2011). First, Oyserman et al. (2011) report data 
from a four-state sample, which revealed that family social resources mattered not for whether children had school-focused possible 
identities but for whether these identities were linked to strategies for action. In these data, children from middle-income families 
were more likely to have strategies to work on their school-focused possible identities than were children from low-income families or 
children from families living in low-income neighborhoods. Second, Oyserman et al. (2007) report that low parent involvement with 
school was associated with worse grades for children in the school-as-usual control group but not for children randomly assigned to 
receive Oyserman's identity-based intervention. The implication these authors drew is that low parent involvement in school un- 
dermines the links students make between their school-focused possible identities and strategies for action. 


2. Current studies 
2.1. Study 1 


In Study 1 we address each of the open questions highlighted in our review of the literature: do school-focused possible identities 
(and link to strategies) change over the course of the school year, does change matter by affecting academic outcomes, and does 
school context affect whether students need strategies in addition to possible identities. We focused on students who might be at risk 
of poor academic outcomes due to their minority background and socioeconomic status who attend schools varying in resources. Our 
schools were high poverty and high minority enrollment. We used core grade point average as our academic outcome measure given 
its centrality in discussion of academic outcomes. To increase the usefulness of our results, we used the four operationalizations of 
school-focused possible identities empirically linked to academic outcomes in our review. Each operationalization is described in our 
Measures section. 


2.1.1. Sample 

The 8th-grade students (N = 461) attending 7 high-poverty, high-minority-enrollment Chicago public schools participated with 
written parental consent. We excluded students whose parents refused consent (n = 50) or who entered classrooms after consent 
forms were collected (n = 91).² Of the remaining 320 children, 40 were missing possible identity data and 33 were missing school 
records, yielding a final sample of n = 247 (55% female, 92% poor, 83% Latinx, 14% African American, 3% White or other race- 
ethnicity). 


² Consent entailed a lengthy process to fit school district requirements. Students who returned a signed form (whether parents signed ‘no’ or ‘yes’) 
were given movie tickets. Once completed, it was too disruptive to the classrooms to start the process again. 
3. Method 
Students signed assent forms and logged onto an online survey in their classrooms in September and May. 
3.1. Human subjects and power 


We obtained IRB approval (UP-14-00287, University of Southern California). We planned our sample to detect a small 
effect (f² = .02; Cohen, 1988) with power of .80 at p < .05, for which, according to our sensitivity analysis, we needed a sample of 
394. Our sample size of 247 is smaller than planned. However, our effects were larger than the minimum we planned for, with f² 
values ranging from .033 to .051 for our analyses without controls. Post-hoc sensitivity analyses suggest that at these effect sizes, our 
power to detect the effects that we found ranges from .81 to .94. Even with all of our controls added, our effects are larger than 
planned, at .028, .036, .039, and .041. Post-hoc sensitivity analyses suggest that at these effect sizes, our power to detect the effects 
that we found is .74, .85, .87, and .89, respectively, at p < .05. Hence, except for the smallest of our effects, we are powered 
to find the effects we found. Full analyses of post hoc effects are found in our Supplemental Materials. 
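
As a rough check on the power figures above, the sketch below approximates power for a single regression coefficient from Cohen's f² and the sample size. It is not the authors' procedure (which the text does not specify); it assumes a two-sided test and a normal approximation in which the test statistic is centered on sqrt(f² × n). The function name `approx_power` is hypothetical. Under these assumptions the results should land close to the values reported for the analyses without controls.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(f2: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-sided power for one regression coefficient.

    Assumption: the test statistic is treated as Normal(sqrt(f2 * n), 1),
    a large-sample stand-in for the noncentral F/t distribution.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # critical value, about 1.96 at alpha = .05
    ncp = sqrt(f2 * n)                  # approximate noncentrality
    # Probability of landing in either rejection region.
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


# Planned design (f2 = .02, n = 394) and the smallest and largest
# effects reported without controls at the obtained n = 247.
for f2, n in [(0.02, 394), (0.033, 247), (0.051, 247)]:
    print(f"f2 = {f2:.3f}, n = {n}: power ~ {approx_power(f2, n):.2f}")
```

Because the approximation ignores the degrees of freedom used by control variables, it is less suited to the models with controls.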


3.2. Possible identities and linked strategies 


The survey began with the expected and feared possible identity prompts detailed in Fig. 1. The prompts allowed students to 
describe up to four expected (n = 1,527 responses obtained) and up to four feared (n = 1,473 responses obtained) possible identities and 
asked students to click if they were doing anything about each possible identity. Students were then shown each of the possible identities 
they were doing something about and reported, for each, what they were doing. 

After training to reach 90% agreement, the first author and a research assistant, both blind to other student information, read the 
full possible identity and strategies entry of each student. Working alone, each coder categorized each student response following the 
online coding scheme (https://dornsife.usc.edu/daphna-oyserman/measures/). After coding was complete, we calculated agreement, 
which averaged 89.6%. The coders then met and resolved each disagreement through discussion. 

Table 1 shows the percentage of student responses in each possible identity category from most to least common. The most 
commonly generated possible identity categories were school/achievement (e.g. “on the honor roll” 70% of fall, 56% of spring 
responses), interpersonal relationships (e.g. “bad friends” 10% fall, 18% spring responses), and off-track (e.g. “doing drugs” 11% fall, 
16% spring responses). Fewer than 10% of responses described possible identities focused on personality traits (e.g. “more honest”), 
physical/health (e.g. “thinner”), material/lifestyle (e.g. “making Youtube videos”), or negative responses to the to-be-expected 
prompt (e.g. [I expect to be] “homeless”). 


4. Measures 
4.1. School-focused possible identity count 


The first way that we operationalized school-focused possible identities was as a simple count of the number of school-focused 
possible identities. This did not require any recoding of the content-coded data described above. 


4.2. Balanced school-focused possible identities count 


The second way we operationalized school-focused possible identities was as a compound score, termed ‘balance’ (e.g., 
Aloise-Young, Hennigan, & Leong, 2001; Oyserman, Gant, & Ager, 1995). Balance is a count of the pairs of positive (expected, e.g., “on the 
honor roll”) and negative (feared, e.g., “getting bad grades”) possible identities that are school-focused. To obtain this score, the first 
author and a research assistant separately coded balanced pairs and then resolved any disagreements through discussion. 


4.3. School-focused possible identities with strategies 


The third way we operationalized school-focused possible identities was as a count of the number of school-focused possible 
identities that included at least one linked strategy (Oyserman & Saltz, 1993). To obtain this score, the first author and a research 
assistant pulled the content-coded data described above and limited their count to only those possible identities that had been 
content-coded as school-focused and which, on further examination, also had a strategy, that is a response to the last question (what 
are you doing) students were asked about their possible identities. 
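
The two count-based scores described so far (the simple count and the count with strategies) can be illustrated with a minimal sketch. The record layout, the `CodedIdentity` class, and the domain label "school" are hypothetical conveniences for illustration; the study's human-coded data and the online coding scheme are the authoritative source.

```python
from dataclasses import dataclass


@dataclass
class CodedIdentity:
    domain: str         # content code, e.g. "school" or "off-track" (hypothetical labels)
    has_strategy: bool  # True if the student wrote what they are doing about it


def simple_count(responses):
    """Number of possible identities content-coded as school-focused."""
    return sum(1 for r in responses if r.domain == "school")


def with_strategies_count(responses):
    """Number of school-focused possible identities with a linked strategy."""
    return sum(1 for r in responses if r.domain == "school" and r.has_strategy)


# One student's expected and feared responses (made-up example).
student = [
    CodedIdentity("school", True),
    CodedIdentity("school", False),
    CodedIdentity("interpersonal", True),
    CodedIdentity("off-track", False),
]
print(simple_count(student), with_strategies_count(student))  # prints: 2 1
```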


4.4. School-focused possible identity plausibility score 


The fourth way we operationalized school-focused possible identities was as a compound score termed ‘plausibility’ (Oyserman, 
Bybee, Terry, & Hart-Johnson, 2004). Plausibility is the extent to which school-focused possible identities and linked strategies 
were concrete and detailed enough to provide a roadmap forward. We scored plausibility using the rubric developed by Oyserman 
and colleagues (Oyserman et al., 2004). This rubric yields scores between 0 = No school-focused possible identities or one vague 
and general school-focused possible identity without any strategies to get there, and 5 = Four or more school-focused possible 




E. Horowitz, et al. Journal of Adolescence 79 (2020) 26-38 


Each of us has some idea of what we might be like in the future. 


First, imagine yourself next year. What do you expect to be like? What do you expect you will be 
doing? 


Second, in the boxes below, write what you expect you will be like and what you expect to be 
doing next year. 


Third, ask yourself if you are doing something to work on this expectation for next year. Click "No" 
if you are not doing something or click "Yes" if you are doing something to work on this 
expectation for next year. 


Am I currently doing something to work on this expectation? 

Next year I expect to be...    No / Yes 
Next year I expect to be...    No / Yes 
Next year I expect to be...    No / Yes 
Next year I expect to be...    No / Yes 


In addition to expectations, we all have some idea of what we don't want to be like or things we do 
not want to be doing. 


First, think about what you do not want to be like or things you do not want to be doing next year-- 
things you are concerned about or want to avoid. 


Second, write those things you want to avoid being like or doing next year in the lines below. 


Third, ask yourself if you are currently doing something to avoid each thing next year. Click "No" if 
you are not currently doing something to avoid each thing or click "Yes" if you are currently doing 
something so this will not happen next year. 


Next year, I want to avoid... 
Next year, I want to avoid... 
Next year, I want to avoid... 
Next year, I want to avoid... 


What are you doing now to be [Possible Identity] next year? 




What are you doing now to reduce the chances that [Possible Identity] will describe you next year? 


Fig. 1. Full instructions for possible identity and strategies, adapted from Oyserman et al., 2006. 


identities with four or more linked strategies with at least one of these strategies focusing on interpersonal aspects of school context 
(e.g., getting along with teachers). We detail this coding rubric in Table 2. The first author coded plausibility and, to obtain a 
reliability score, a research assistant double-coded a random sample of 10% of responses. Our Cohen's Kappa (Fleiss & Cohen, 1973) was 
0.85, reflecting substantial agreement following Landis and Koch's (1977) Cohen's Kappa rule-of-thumb that scores between 0.61 and 
Table 1 
Percentage of student possible identity responses in the fall and spring reflecting each possible identity domain. 


Possible Identity Domain       Fall    Spring 
School-focused                 70%     56% 
Off-track                      11%     16% 
Interpersonal Relationships    10%     18% 
Material/Lifestyle             5%      3% 
Personality Traits             2%      3% 
Health/Physical                2%      3% 
Negative but expected          <1%     <1% 


Note: School-focused includes possible identities focused on school and on achievement. 


Table 2 
Coding school-focused possible identity plausibility, adapted from Oyserman et al., 2004. 


Plausibility   Count of school-focused       Count of SFPI        Explanation of score 
score          possible identities (SFPI)    linked strategies 
0              0                             0                    0 SFPI 
               1                             0                    OR 1 vague or general SFPI AND 0 SFPI strategies 
1              1                             1                    1 SFPI and 1 SFPI strategy 
               2                             0                    OR 2 SFPI but no SFPI strategies 
2              1                             2* or more           1 SFPI and 2 or more SFPI strategies* 
               2                             1-2                  OR 2 SFPI and 1-2 SFPI strategies 
               3                             0* or 1              OR 3 SFPI and 0*-1 SFPI strategies 
               4 or more                     0                    OR 4 or more SFPI and 0 SFPI strategies 
3              2                             3* or more           2 SFPI and 3 or more SFPI strategies* 
               3                             2 or 3               OR 3 SFPI and 2-3 SFPI strategies 
               4 or more                     1* or 2              OR 4 or more SFPI and 1*-2 SFPI strategies 
4              3                             4 or more            3 SFPI and 4 or more SFPI strategies 
               4 or more                     2*, 3 or 4           OR 4 SFPI and 2*-4 SFPI strategies 
5              4 or more                     4 to 5 or more       4 or more SFPI AND 4 or more strategies AND at least one strategy for an SFPI 
                                                                  is focused on interpersonal aspects of school context. 


Note: Include all possible identity content, including responses to expected and feared possible identity probes. Codes noted with * mean code at this 
level only if at least one of the possible identities and/or strategies that are provided are detailed or concrete, that is if specific action is implied and 
possible identities are not redundant, otherwise code at the next lower level of plausibility. 


0.8 reflect substantial agreement. Because small disagreements are qualitatively different from large disagreement, we also calculated 
a weighted Cohen's Kappa in which small disagreements (codes that differed by a single point) were weighted to reflect 80% 
agreement. This yielded a Cohen's Kappa of .96. An alternative form of reliability, percentage agreement, yielded a score of 88%. No 
interrater disagreement was larger than a single point on the 0-5 plausibility scale. 
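
The reliability statistics described above can be sketched as follows. This is an illustrative implementation of Cohen's kappa with an optional agreement weight for one-point disagreements, not the authors' code; the function name `cohen_kappa` and the parameter `one_point_credit` are hypothetical. Setting `one_point_credit=0.8` mirrors the weighting described in the text, under the assumption that disagreements larger than one point receive no credit.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b, one_point_credit=0.0):
    """Cohen's kappa; one-point disagreements may earn partial credit.

    one_point_credit = 0.0 gives the unweighted kappa. Disagreements of
    more than one point always count as full disagreement (assumption).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    def weight(i, j):
        if i == j:
            return 1.0
        return one_point_credit if abs(i - j) == 1 else 0.0

    # Observed (weighted) agreement across double-coded responses.
    observed = sum(weight(a, b) for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal distribution.
    marg_a, marg_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        weight(i, j) * marg_a[i] * marg_b[j] for i in marg_a for j in marg_b
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Made-up plausibility codes (0-5) from two coders.
coder_1 = [3, 4, 2, 5, 3, 0, 1, 4]
coder_2 = [3, 4, 3, 5, 3, 0, 1, 4]
print(cohen_kappa(coder_1, coder_2))
print(cohen_kappa(coder_1, coder_2, one_point_credit=0.8))
```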


4.5. Student demographics and academic outcomes 


Chicago Public Schools provided 6th, 7th, and 8th-grade course grades, student gender, free/reduced lunch status (poverty), and 
racial-ethnic heritage as part of a data sharing agreement with the American Institutes for Research. We computed final 6th, 7th, and 
8th-grade core grade point average (GPA) by computing the average final grades in Math, Science, English, History, and Social 
Studies (0 = F, 1 = D, 2 = C, 3 = B, 4 = A). 
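
The core GPA computation amounts to mapping final letter grades onto the 0-4 scale and averaging across core subjects. The sketch below assumes grades arrive as letters keyed by subject name; the dictionary layout and the function name `core_gpa` are illustrative, not the format of the district records.

```python
# Point values follow the scale given in the text (0 = F ... 4 = A).
GRADE_POINTS = {"F": 0, "D": 1, "C": 2, "B": 3, "A": 4}


def core_gpa(final_grades):
    """Average grade points across the core subjects supplied."""
    points = [GRADE_POINTS[letter] for letter in final_grades.values()]
    return sum(points) / len(points)


# Made-up record for one student.
example = {
    "Math": "A",
    "Science": "B",
    "English": "C",
    "History": "B",
    "Social Studies": "A",
}
print(core_gpa(example))  # (4 + 3 + 2 + 3 + 4) / 5 = 3.2
```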


4.6. School-level context 


We calculated each school's student-to-teacher ratio, percentage of students in poverty, and percentage of students who identified 
as Latinx or African American using the Common Core of Data (https://nces.ed.gov/ccd/pubschuniv.asp). 


5. Analysis plan 
5.1. Testing whether school-focused possible identities change 


We used four paired t-tests (September, May) to test whether possible identities changed over the course of the school year for 
each of the four school-focused possible identity scores. 
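
A paired t-test on fall and spring scores, together with a standardized effect size, can be sketched as below. The effect size shown is d computed on the difference scores (mean difference divided by the SD of differences), which equals t divided by the square root of n; this convention is an assumption, although it appears consistent with the values reported in the Results. Function names are hypothetical and the toy data are made up.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(fall, spring):
    """Return (t, df, d) for spring-minus-fall difference scores."""
    diffs = [s - f for f, s in zip(fall, spring)]
    n = len(diffs)
    sd = stdev(diffs)                    # sample SD of the differences
    t = mean(diffs) / (sd / sqrt(n))     # paired-sample t statistic
    d = mean(diffs) / sd                 # standardized mean difference
    return t, n - 1, d


def d_from_t(t, n):
    """Recover d on difference scores from a paired t and sample size."""
    return t / sqrt(n)


# Toy data for six students.
fall = [4, 3, 5, 2, 4, 3]
spring = [3, 3, 4, 2, 3, 4]
t, df, d = paired_t(fall, spring)
print(t, df, d)

# Applying the identity to the balance score reported in Table 3
# (t = -3.869, n = 247) gives a d of about -.25.
print(round(d_from_t(-3.869, 247), 2))
```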
5.2. Testing effects of change in school-focused possible identities on academic outcomes, and whether effects are context-dependent 


We used hierarchical multiple regression equations to test our prediction that trajectories of change in school-focused possible 
identities affect trajectories of change in academic outcomes. First, we created a change score for each of the four school-focused 
possible identity scores by regressing May scores on September scores and saving the residuals. Second, we standardized these change 
scores and student GPA. We chose residualized change scores rather than raw change scores because this method is better at 
accounting for differences in expected changes that result from differences in baseline (September) possible identity scores. Third, we 
created a set of four hierarchical multiple regression models for each of the four possible identity scores. In each model, 8th-grade core 
GPA was our dependent variable. 
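
The residualized change score described above can be sketched as a simple regression of May scores on September scores, with the residuals then standardized. This is a minimal illustration under stated assumptions: ordinary least squares with one predictor, and standardization by the population SD (the text does not say which SD was used). The function name is hypothetical and the data are made up.

```python
from statistics import mean, pstdev


def residualized_change(september, may):
    """Standardized residuals from regressing May scores on September scores."""
    mx, my = mean(september), mean(may)
    sxx = sum((x - mx) ** 2 for x in september)
    sxy = sum((x - mx) * (y - my) for x, y in zip(september, may))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual: May score minus the score predicted from baseline.
    residuals = [y - (intercept + slope * x) for x, y in zip(september, may)]
    sd = pstdev(residuals)  # OLS residuals have mean zero, so dividing by SD standardizes
    return [r / sd for r in residuals]


# Toy school-focused possible identity counts for six students.
september = [2, 4, 3, 5, 1, 4]
may = [1, 4, 4, 3, 2, 3]
change = residualized_change(september, may)
print([round(z, 2) for z in change])
```

A positive value marks a student whose May score is higher than expected given their September score, and a negative value the reverse. By construction the scores are uncorrelated with baseline, which is the property the text cites as the reason for preferring them to raw differences.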

In the first model for each possible identity score we entered our residualized school-focused possible identity change score. In the 
second model we included dummy codes for six schools to control for school effects. In the third model we added 6th and 7th grade 
core GPA to control for prior academic performance. In the fourth model, we entered contrast codes for gender, for being Latinx, and 
for receiving free or reduced priced lunch in order to control for student-level characteristics. 

We ran a series of follow-up regression models to test how specific aspects of the school context might influence the effect of 
possible identities. In these models, instead of entering school dummy codes at the second step, we entered school-level measures of 
Latinx ethnicity, free or reduced priced lunch eligibility, and student-teacher ratios. 


6. Preliminary analyses 


Before conducting our planned analyses, we conducted preliminary analyses which are detailed in our on-line Supplemental 
materials. In these analyses, we examined the association between child-level demographics and school-level variables and possible 
identity and academic outcome variables to ascertain which demographic and school-level variables should be included as controls. 
In brief, we found that being a girl was positively associated with (better) grades and with (positive) change in count of school- 
focused possible identities, balance, and plausibility of school-focused possible identities. We found less consistent negative effects of 
race (being Latinx, going to a Latinx-dominated school), poverty (being poor, going to a school with many other poor children) on 
grades, and of (larger) student-teacher ratios on (worse) grades. Only race was (negatively) associated with change in count of 
school-focused possible identities. Hence, we include individual demographics and school-context variables as detailed in our analyses 
below. 


7. Results and discussion 


Do school-focused possible identities change over the course of the school year? Yes, on average, school-focused possible 
identities do change—they tend to decline. But this average decline conceals individual differences. Table 3 presents September-May 
paired t-tests with 95% confidence intervals and Table 4 presents the percentage of students experiencing stability, a decrease, or an 
increase in school-focused possible identities. As can be seen, school-focused possible identity scores decline on average by a small 
but significant amount (d = -.25 balance, d = -.17 count of school-focused possible identities with strategies, d = -.18 school-focused 
roadmap plausibility score, and d = -.11 count of school-focused possible identities). However, this decline is not found in all 
students—school-focused possible identity scores remain stable among nearly a quarter of students and increase among nearly a third 
of students. 

Do changes in school-focused possible identities predict academic outcomes and are effects context-dependent? Change 
in school-focused possible identity scores matters, significantly predicting 8th-grade core GPA. When school-focused possible 
identities decline so do grades, and the reverse is true when school-focused possible identity scores increase. These results are detailed in 
Table 5. Moreover, change in possible identity scores matters even when controlling for school, prior core GPA, and individual 
demographics. As can be seen in Table 5, Model 1, possible identity plausibility score is the strongest predictor of 8th-grade GPA, 
explaining at least 25% more variance in 8th-grade core GPA than any of the other possible identity scores. 

However, once we take school-context effects into account, the possible identity metrics do not differ meaningfully in how well 
they predict GPA. This suggests that schools may differ in the extent to which they help students self-regulate and craft strategies to 


Table 3 
Mean (SD), 95% confidence intervals (CI), and paired-sample t-test results for change in school-focused possible identity metrics. 

Measure of School-Focused Possible Identities    Fall M (SD)    Spring M (SD)    t-test of Change (df = 246)    95% CI of Change    p 
Simple Count (0 = none, 8 = all)                 3.86 (2.09)    3.59 (1.85)      -1.748                         -.586, .035         .082 
Balance Count (0 = none, 4 = all)                1.35 (1.04)    1.05 (0.95)      -3.869                         -.458, -.149        .000 
With Strategies Count (0 = none, 8 = all)        3.42 (2.12)    3.00 (1.91)      -2.614                         -.738, -.104        .009 
Plausibility Score (0 = none, 5 = all)           3.54 (1.49)    3.20 (1.58)      -2.759                         -.583, -.097        .006 


Note: Count scores range from a theoretical minimum of 0 to a theoretical maximum of 8, balance relies on pairs of possible identities so the 
theoretical maximum is 4, and plausibility is a score with a theoretical maximum of 5. The p-values do not correct for multiple comparisons 
because our goal was to ask if authors using the different scoring methods would have found different results. 
Table 4

Percentage of participants who had an increase, no change, or a decrease in their school-focused possible identity scores. 
Measure of School-Focused Possible Identities Increase No Change Decrease 
Simple Count of School-Focused Possible Identities 34% 19% 47% 
‘Balanced’ School-Focused Possible Identities Count 23% 35% 42% 
Count of School-Focused Possible Identities With Strategies 34% 16% 49% 
School-Focused Possible Identities Plausibility Score 30% 28% 42% 
AVERAGE ACROSS MEASURES 30% 25% 45% 

Table 5

Effect of change in school-focused possible identity scores on 8th-grade core GPA.

Predictor | Model 1: B | 95% CI | p | Model 2: B | 95% CI | p | Model 3: B | 95% CI | p | Model 4: B | 95% CI | p
School-focused possible identity count | .190 | .066, .314 | .003 | .210 | .099, .322 | .000 | .089 | .032, .145 | .002 | .087 | .030, .145 | .003
School-focused possible identity balance count | .178 | .054, .302 | .005 | .207 | .097, .318 | .000 | .085 | .028, .141 | .003 | .084 | .027, .140 | .004
School-focused possible identities with strategies count | .183 | .059, .307 | .004 | .179 | .066, .291 | .002 | .090 | .033, .146 | .002 | .089 | .033, .146 | .002
School-focused possible identities plausibility score | .220 | .097, .343 | .001 | .211 | .099, .323 | .000 | .078 | .021, .135 | .008 | .075 | .017, .133 | .011

Notes: In each model the dependent variable is 8th-grade core grade point average; Model 1 includes no additional predictors; Model 2 includes
school dummy codes; Model 3 includes prior core GPA in 6th and 7th grades; Model 4 includes student-level demographic and poverty data. When B
is positive, the possible identity predictor is associated with higher core grade point average in 8th-grade.


attain their school-focused possible identities. Some schools may provide opportunities to engage in strategies for anyone with
sufficient school-focused possible identities, whereas other schools do not. These results are detailed in Table 5, Model 2, which shows that
once school context is accounted for, the lower end of the 95% confidence intervals jumps up for the regression coefficients of the two 
scores that do not account for strategies so that change in plausibility score is no longer a better predictor of 8th-grade core GPA. 

As can be seen in Table 5, Models 3 and 4, change in school-focused possible identities remains a significant predictor of 8th-grade 
core GPA even after prior academic attainment and demographics are accounted for. Indeed, the real-world effect of change in 
school-focused possible identities is meaningful when considering the size of the effect of demographics on grade trajectories. In our 
sample, demographics (gender, poverty, and being Latinx) explained 11.9% of the variance in 8th-grade core GPA. These factors are 
either not changeable or largely outside the purview of intervention. Changing school-focused possible identities is not as difficult 
and explains about 3.6% of the variance in 8th-grade core GPA. After controlling for prior core GPA, school, and individual poverty 
and demographics, change in the count of students’ school-focused possible identities still explains about 3.8% of the remaining 
variance in 8th-grade core GPA. This strongly suggests that possible identities are a unique source of academic motivation, not 
redundant with prior academic success or socioeconomic factors.

We conducted follow-up analyses with models controlling for specific aspects of the school context. These analyses reveal that 
change in possible identities matters even after controlling for these aspects of school context. Specifically, these results (detailed in 
Model 5 in Table 6) show robust effects on 8th-grade core GPA for each of our possible identity scores. That is, even the few school- 
level resource factors we can assess matter, shifting up the relative predictive power of the simplest count score relative to the more 
complex plausibility score. However, as detailed in our on-line Supplemental Materials Table S4, we found no evidence of a context 
by possible identity score interaction (though we are underpowered to find such an effect). 

Our core Study 1 findings are that school-focused possible identities typically change over the course of the school year and this 
change predicts end-of-year academic outcomes, even controlling for past academic performance, demographics, and school factors. 
School-focused possible identities do not change for everyone, but when they do, it matters. If school-focused possible identities 
decline, grades go down, and if school-focused possible identities increase, grades go up. This is an important finding for develop- 
mental researchers and good news for interventionists who had assumed this to be the case. 


Table 6

Effect of change in possible identity scores on 8th-grade Core GPA, controlling for school level context.

Predictor | Model 5: B | 95% CI | p
School-focused possible identity count | .222 | .101, .343 | .000
School-focused possible identity balance count | .205 | .084, .326 | .001
School-focused possible identities with strategies count | .206 | .085, .327 | .001
School-focused possible identities plausibility score | .243 | .123, .362 | .000

Notes: The dependent variable is 8th-grade core grade point average; Model 5 controls for three measures of school context: (less) school-level free
and reduced price lunch rate, (higher) student-teacher ratio, and (smaller) percentage of students in the school that identify as Latinx.
7.1. Study 2


Study 2 addresses the obstacle to using these results in future studies: Coding open-ended textual data is time-consuming, resource 
intensive, and may not be possible in scaled up research or interventions. So, to create an economical and feasible alternative, we 
developed a machine-learning classification algorithm that could computationally code the content of students’ possible identities. 
Machine-learning classification algorithms acquire their knowledge from source data, learning how things are classified from ex- 
amples of previously classified data—in this case, possible identity responses whose content has been coded by researchers. The 
algorithm is then used to classify new cases (whose classification is unknown). To be effective, training data sets need to contain 
sufficiently rich content. The literature on machine learning algorithms for classifying real-world categories mostly involves samples of a
thousand or more (e.g., Chang, Chen, Chang, Chung, & Lai, 2012; Kessler et al., 2016; Nelson et al., 2012). The “training” data is used
to develop and test an algorithm, which optimally (as in our case) can then be used in a completely different sample. We provide our 
Python code and details of the development and testing process in our online Supplemental Materials. 


7.1.1. Sample 

Our algorithm training sample (n = 1146) was separate from our Study 1 sample. It included 8th-graders from 10 Chicago Public
Schools. As in Study 1, children were mostly Latinx or African American, and the schools enrolled mostly children from low-income
families, as detailed next. As in Study 1, we excluded children whose parents refused consent (n = 58), who entered school after
consent forms were collected (n = 75) or who were missing possible identity data (n = 7). We retained the remaining 85.91% 
(n = 1006, 52% female, 87% free/reduced price lunch, 65% Latinx, 18% African American, 12% White, 4% Asian, 1% other race- 
ethnicity) for algorithm development. 


8. Method 
8.1. Data collection and coding 


We used the same methods as in Study 1, collecting data in September and May, coding n = 6189 expected and n = 6060 feared 
possible identity responses. We (first author, two research assistants) double-coded 80% of all responses, discussing disagreements until we
reached agreement. The first author coded the remaining 20% of responses, and a research assistant double-coded a subset of 20% of these
responses (89.55% agreement before discussion). Table 7 shows the distribution of responses. 


8.2. Training the classifier 


We trained a classifier on our researcher-coded responses as follows. First, we combined possible identity and strategy text for a 
given response into a single block of text as in Study 1. Second, we trained separate classifiers for expected and feared possible 
identities because the language children use to describe expected and feared possible identities is quite distinct. Third, we trained on 
the three categories (“school-focused”, “off-track”, and “interpersonal”) that each had at least 10% of all responses and added an 
“other” category for the remaining responses. 
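The three preparation steps can be sketched in Python as follows. This is an illustrative sketch, not the authors' code (which is in the Supplemental Materials): the record fields (`kind`, `identity`, `strategy`, `code`), the helper names, and the toy records are all hypothetical.

```python
from collections import Counter

MIN_SHARE = 0.10  # codes below this share of all responses are pooled as "other"

def combine_text(identity, strategy):
    """Step 1: merge possible identity and strategy text into one block."""
    return f"{identity} {strategy}".strip()

def pool_rare_codes(codes, min_share=MIN_SHARE):
    """Step 3: keep codes with at least min_share of responses; relabel the rest."""
    counts = Counter(codes)
    total = len(codes)
    keep = {code for code, n in counts.items() if n / total >= min_share}
    return [code if code in keep else "other" for code in codes]

def build_training_sets(records):
    """Step 2: separate (text, label) training sets for expected and feared."""
    labels = pool_rare_codes([r["code"] for r in records])
    sets = {"expected": [], "feared": []}
    for record, label in zip(records, labels):
        text = combine_text(record["identity"], record["strategy"])
        sets[record["kind"]].append((text, label))
    return sets

# Toy records (hypothetical): 7 school-focused, 2 off-track, 2 interpersonal, 1 health.
records = (
    [{"kind": "expected", "identity": "successful",
      "strategy": "focus on classwork", "code": "school-focused"}] * 7
    + [{"kind": "feared", "identity": "fights",
        "strategy": "step off", "code": "off-track"}] * 2
    + [{"kind": "feared", "identity": "drama",
        "strategy": "ignore rumors", "code": "interpersonal"}] * 2
    + [{"kind": "expected", "identity": "healthy",
        "strategy": "", "code": "health"}]
)
training_sets = build_training_sets(records)
```

In this toy data the single health response makes up less than 10% of the 12 responses, so it is relabeled "other", mirroring how the smaller content domains were pooled.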

Our algorithm used linear support vector machines (SVM; Joachims, 1998) and Distributed Dictionary Representations (DDR; 
Garten et al., 2018; Hoover, Johnson, Boghrati, Graham, & Dehghani, 2018). With DDR, words are represented as points in a “low-dimensional”
space (generally 10-2000 dimensions) based on their locations relative to other words in real world texts. Our word
embeddings in this space were based on a large corpus of Google News articles (Word2Vec; Mikolov et al., 2013). Constructs (e.g.,
school-focused possible identity) are represented in this space based on the locations of the words coded as being associated with these
constructs. Following Garten, Sagae, Ustun, and Dehghani (2015) we generated a response-level representation of each written 
possible identity and strategy response and spatial representations of each possible identity category (school-focused, off-track, 
interpersonal, and “other”) based on the locations of the responses we coded as belonging to each category. We used this location to 
train an SVM classifier, which attempted to classify new, uncoded responses (e.g., skip classes) based on the location of the response 
in space relative to the locations of already coded representations of each category of possible identity. We evaluated how well our 
classifier did at coding possible identity responses into school-focused, off-track, interpersonal, and “other” using 10-fold cross 


Table 7
Content of possible selves: Percentage of responses in each domain in algorithm development sample (as coded by researchers).

Content Domain | Development Sample, Fall and Spring Combined
School/Achievement | 59%
Off-track | 16%
Interpersonal Relationships | 14%
Material/Lifestyle | 4%
Personality Traits | 3%
Health/Physical | 3%
Negative but expected | <1%
Table 8
Coding the Opening Example: Researcher Codes vs. Machine Codes.

Possible Identity | Strategy | Researcher Code | Machine Code
helpful | Whenever my friends ask me a question about the homework or about a class assignment and try to explain more. | Interpersonal | School
paying more attention | This year I have been taking more notes and writing down anything I need to help me understand what we are working on. | School | School
unstoppable | I'm trying to do anything I possibly can to get the best grades and to graduate. Next year I will do the same and no one will stop me from achieving my goals. | School | School
less distracted | I'm focusing more on my work and asking any questions I need to. | School | School
trouble | Instead of for example laughing in class like I always do, I will do my best to be more serious and do nothing bad. | School | School
detention | I have never got detention and I will still try to avoid that because I do not want that affecting me. | School | School
fights | Instead of arguing with my friends I will try to reason with them and not let anything get physical and no matter what anyone says to me I'll talk to them instead of being physical. | Off-track | Interpersonal
bad grades | I have gotten bad grades last year and I know that really affected me and my future so this year and next year I will do my best to get good grades. | School | School
successful | Focusing on my classwork and focusing on how to improve my grades | School | School
peaceful | - | Personality | Interpersonal
chaos | Ignore people who want to start something | Off-track | School
fakes | Ignore all fake people, just say hi and bye | Interpersonal | Interpersonal
drama | Ignore all rumors | Interpersonal | Interpersonal
fights | Make the right choice and step off | Off-track | Off-track

Note: - = no response, no code. For this particular student, the researcher and machine code disagreed four times. The machine
likely coded the 1st response as school because it discusses homework, while the human coders focused on interpersonal because that discussion was in the context of helping a friend. A
second disagreement came in the 7th response. Here the algorithm gave more weight to the strategy, which mentioned friends, rather than the
“fight” possible identity, which researchers code as off-track. A third disagreement came in the 10th response: given only a single unique and
rarely-used word (“peaceful”), the algorithm essentially had to guess. The fourth disagreement came in the 13th response. Here the possible identity
was another unique and rarely used word (“chaos”), and without many other keywords, the algorithm again had to take a somewhat uneducated
guess.


validation. That is, the full set of training data was partitioned into 10 subsets with each used to test the accuracy of an algorithm 
trained on the other nine subsets. The final result, an overall classification accuracy based on the average of all 10 tests, is 89.18% for 
expected possible identity responses and 85.64% for feared possible identity responses. To concretize what the method looks like, 
Table 8 provides an example of machine-coding and researcher coding of the possible identities of the student quoted at the be- 
ginning of the paper. 
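To make the representation and validation steps concrete, here is a minimal, self-contained Python sketch. It is not the authors' implementation: the three-dimensional word vectors are made-up stand-ins for Word2Vec embeddings, a nearest-centroid rule based on cosine similarity replaces the linear SVM, and all function names and toy responses are hypothetical. It only illustrates how a response can be represented as the average of its word vectors and how k-fold cross-validation averages accuracy over held-out subsets.

```python
import math

# Made-up 3-d "embeddings"; real DDR would use pretrained Word2Vec vectors.
TOY_VECTORS = {
    "grades": (1.0, 0.1, 0.0), "homework": (0.9, 0.2, 0.1), "class": (0.8, 0.0, 0.2),
    "friends": (0.1, 1.0, 0.1), "rumors": (0.0, 0.9, 0.2), "drama": (0.2, 0.8, 0.0),
    "fights": (0.1, 0.2, 1.0), "detention": (0.2, 0.0, 0.9), "skip": (0.0, 0.1, 0.8),
}

def embed(text):
    """Represent a response as the mean of its known word vectors."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return None  # no known words: nothing to base a classification on
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def fit_centroids(examples):
    """Average the response vectors of each researcher-coded category."""
    by_label = {}
    for text, label in examples:
        vec = embed(text)
        if vec is not None:
            by_label.setdefault(label, []).append(vec)
    return {label: tuple(sum(d) / len(vs) for d in zip(*vs))
            for label, vs in by_label.items()}

def predict(centroids, text):
    """Assign the category whose centroid is closest (stand-in for the SVM)."""
    vec = embed(text)
    if vec is None:
        return "other"  # sketch-only choice for responses with no known words
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

def cross_validate(examples, k=10):
    """Hold out each of k folds once; return mean held-out accuracy.

    Assumes len(examples) >= k so that no fold is empty.
    """
    folds = [examples[i::k] for i in range(k)]
    accuracies = []
    for i, test_fold in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        centroids = fit_centroids(train)
        correct = sum(predict(centroids, text) == label for text, label in test_fold)
        accuracies.append(correct / len(test_fold))
    return sum(accuracies) / k

# Toy coded responses (hypothetical), 10 per category.
examples = (
    [("grades homework", "school-focused"), ("class grades", "school-focused")] * 5
    + [("friends rumors", "interpersonal"), ("drama friends", "interpersonal")] * 5
    + [("fights detention", "off-track"), ("skip class fights", "off-track")] * 5
)
accuracy = cross_validate(examples, k=10)
```

The toy responses are deliberately easy to separate, so the resulting accuracy says nothing about real data; student responses are far noisier, which is why the reported accuracies are below 100%.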


9. Results 


Preliminary analyses. We used our algorithm to classify Study 1 school-focused possible identity responses. This tested if a 
machine-coded version of our school-focused possible identity score could be used in situations in which human coding is too 
expensive and burdensome. We found that the school-focused, interpersonal, off-track, or “other” algorithm-generated classifications 
matched our researcher coding for expected (90.9% match) and feared (88.3% match) possible identities. Moreover, mean machine- 
coded school-focused possible identity count score showed the same Fall (M = 3.95, SD = 2.00) to Spring (M = 3.78, SD = 1.89) 
decline we saw in our researcher coding, paired t-test, t = -1.09, 95% CI [-.477, .137], p = .276, d = .069.
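The match rates above are simple percent agreement between two sets of codes. As an illustration, the Python sketch below computes percent agreement for the 14 responses of the single student shown in Table 8. The function name is hypothetical, and this one-student figure is only a worked example, not one of the sample-level match rates.

```python
def percent_agreement(codes_a, codes_b):
    """Share of responses (in %) given the same code by both coders."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both coders must code the same responses.")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100 * matches / len(codes_a)

# Researcher and machine codes for the Table 8 responses, in table order.
researcher = [
    "Interpersonal", "School", "School", "School", "School", "School",
    "Off-track", "School", "School", "Personality", "Off-track",
    "Interpersonal", "Interpersonal", "Off-track",
]
machine = [
    "School", "School", "School", "School", "School", "School",
    "Interpersonal", "School", "School", "Interpersonal", "School",
    "Interpersonal", "Interpersonal", "Off-track",
]

agreement = percent_agreement(researcher, machine)
disagreements = sum(r != m for r, m in zip(researcher, machine))
print(f"{agreement:.1f}% agreement, {disagreements} disagreements")
```

For this student the two sets of codes agree on 10 of 14 responses, in line with the four disagreements described in the Table 8 note.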

Do changes in machine-coded school-focused possible identities predict academic outcomes? Machine-coding works. As 
Table 9 details, change in our machine-coded possible identity metric (a residualized change score, as in Study 1) predicted end of


Table 9
Effect of change in machine-coded school-focused possible identities on 8th-grade Core GPA.

Predictor | Model 6: B | 95% CI | p | Model 7: B | 95% CI | p | Model 8: B | 95% CI | p | Model 9: B | 95% CI | p
Machine-coded school-focused possible identity score | .162 | .038, .287 | .011 | .155 | .042, .267 | .007 | .073 | .016, .129 | .012 | .073 | .016, .131 | .012

Notes: The dependent variable in each model is 8th-grade core grade point average and the predictor is machine-coded school-focused possible
identity score; Model 6 includes no additional predictors; Model 7 includes school dummy codes as the additional predictor; Model 8 includes the
students' own prior core GPA in 6th and 7th grades in addition to school dummy codes as the additional predictors; Model 9 includes student-level
demographic and poverty data in addition to prior core GPA as the additional predictors. When B is positive, the possible identity predictor is
associated with higher core grade point average in 8th-grade.
8th grade core GPA, even after controlling for school, demographics, and prior academic performance. Specifically, we used the four
regression models from Study 1 (detailed in Table 5) to test if change in machine-coded possible identities predicts 8th-grade core 
GPA (Model 6), and if effects remain after adding dummy codes for school (Model 7), after adding 6th and 7th-grade core GPA (Model 
8), and after adding student-level demographics and poverty (Model 9). 
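A residualized change score is the part of the spring score that is not predicted by the fall score, that is, the residual from regressing spring on fall. The Python sketch below illustrates the idea with a hand-written simple regression on made-up numbers. The data, the variable names, and the single-predictor model (analogous to Model 6, without covariates) are assumptions for illustration and not the authors' analysis code.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residualized_change(fall, spring):
    """Spring scores minus the values predicted from fall scores."""
    intercept, slope = simple_ols(fall, spring)
    return [s - (intercept + slope * f) for f, s in zip(fall, spring)]

# Made-up scores for six hypothetical students.
fall = [2, 3, 4, 4, 5, 6]
spring = [1, 4, 3, 5, 4, 7]
gpa = [2.0, 3.1, 2.6, 3.4, 2.8, 3.7]

change = residualized_change(fall, spring)

# Regress GPA on the change score; a positive slope B means that students whose
# scores rose more than expected from their fall score tend to have higher GPA.
_, b = simple_ols(change, gpa)
print(round(b, 3))
```

By construction the change scores sum to zero and are uncorrelated with the fall scores, so they capture only the change that baseline standing does not explain. Models 7 to 9 would add covariates, which requires multiple regression rather than the single-predictor helper used here.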


10. Discussion 


Our review highlighted three open questions and an obstacle to progress in the possible identity literature. In Study 1 we ad- 
dressed the open questions, which are: do school-focused possible identities remain stable or change over the course of the school 
year, does change matter for school grades, and does school context matter for whether students need to have strategies to attain their 
school-focused possible identities? Then, in Study 2 we addressed the obstacle to progress, which is that coding open ended responses 
is difficult to do at scale, by developing a machine learning algorithm to code open-ended possible identities data. 

We found that most students experienced change in their school-focused possible identities over the course of the school year and
that change mattered, affecting students' core grade point average even when controlling for two prior years of grades. When school-focused
possible identities declined so did grades, and when school-focused possible identities increased, so did grades. Each of the
possible identity scores we tested was a useful predictor of 8th-grade academic outcomes, including our machine-coded version. We
found that change in school-focused possible identities is consequential; variance explained by change in students' school-focused
possible identities is about a third of the variance explained by their gender, poverty, and race-ethnicity combined. We also found
that school context matters. In some school contexts, but not others, strategies add to the motivational force of school-focused possible
identities. It might be that strategies matter less in higher-resource contexts, but future research is needed to test this prediction.

Our results also imply that interventions targeting changing possible identities and strategies should take into account that 
varying proportions of students might otherwise have stable, declining, or increasing school-focused possible identities. This het- 
erogeneity might imply that different subgroups of students are receptive to different kinds of intervention activities. It may also be 
good news for group-based interventions if interveners can leverage this to foster a norm of having and developing school-focused 
possible identities and linked strategies. 

That said, we focus on a particular time phase, the last year of middle school, such that students are asked to imagine their 
transition to high school. Most of our participants are 13-14 years of age. Each of these aspects of development is likely to matter. 
Indeed, it would be hard to argue that development does not matter for the content of possible identities—to do so would be to 
suggest that age, pubertal development, changing societal expectations, and the acquisition of adult social roles do not matter. In our 
own lab (O'Donnell & Oyserman, 2019), we are examining each of these factors, finding that each does matter. Specifically, students 
who are closer to high school completion have fewer school-focused possible identities—it is as if they are counting down to the end 
of high school when school will no longer matter. In contrast, greater pubertal development is associated with more school-focused 
possible identities—it is as if they are experiencing adulthood as closer and needing to get going on adult roles. Regarding stability, it 
might be that declines in school-focused possible identities are steeper in the middle school years, with past academic outcomes 
narrowing students’ sense that change is possible in the high school years. 

The accuracy of our machine learning classification algorithm is also good news because coding open-ended responses is a 
stumbling block for researchers, especially those who wish to use large data sets and evaluate scaled up interventions. We share our 
newly developed machine-coding algorithm code so that others can use it in their own research. The implication is that researchers 
wanting to study possible identity effects at scale can use an idiographic measure that allows students to express their possible
identities and quantify results without costly coding. 

Any study, of course, has limitations. Here, we consider three: lack of experimental control, limited access to school context 
variables, and use of a single geographic region (Chicago). First, with regard to experimental control, our results do address an 
important gap regarding the temporal stability of possible identities and the consequences of change—we document temporal change 
in possible identities over the course of the school year and effects of this change on change in academic trajectories. However, we 
cannot make causal claims because even though we controlled for two prior years of academic attainment, child-level demographics, 
and school-level factors, we did not manipulate change in possible identities. Hence, our research is informative of developmental 
trajectories, not of causal processes. 

Second, with regard to school context variables, our study obtained data from seven schools, allowing us to begin to test school 
context effects. However, our school-level variables were limited. That meant that although we could document that school context 
matters, we were not powered to fully unpack why. Future research is needed to better understand what about schools differs, such 
that in some schools students need their own roadmap (the strategies to get going and keep on track), while in other schools someone 
else can provide directions as long as students have the school-focused possible identities. Our hunch is that in some schools, parents,
teachers, and classmates are able to provide students with needed directions, while in other schools, students need to carry their own 
roadmap with them. 

Third, with regard to geographic region, our study documented effects in one geographic region, Chicago, and developed a 
machine coding algorithm from students in the same large urban school district. While these are important first steps, future research 
is needed to test the stability of our results in different settings and the ability of our algorithm to code responses from students in 
different regions, schools, age ranges, and settings. 

Irrespective of these limitations, our results are important because they provide evidence that changes in school-focused possible 
identities and strategies over the school year change students’ academic trajectories. These results are congruent with the theorized 
but typically not tested process of behavior change underlying numerous interventions. Our results document that developmental 
trajectories in school-focused possible identities are heterogeneous, implying that interventions should take this into account. This
could be done by learning more about the situations that predict downward rather than upward or stable trajectories or by learning 
more about the forces interventions can harness to stabilize positive and turn around negative trajectories. 